In [2]:
from __future__ import print_function, division

import thinkdsp
import thinkplot
import thinkstats2

import numpy as np
import scipy.fftpack
import scipy.stats

import warnings
warnings.filterwarnings('ignore')

import dct

%matplotlib inline

Exercise: In this chapter I claim that analyze1 takes time proportional to $n^3$ and analyze2 takes time proportional to $n^2$. To see if that's true, run them on a range of input sizes and time them. In IPython, you can use the magic command %timeit.

If you plot run time versus input size on a log-log scale, you should get a straight line with slope 3 for analyze1 and slope 2 for analyze2. You also might want to test dct_iv and scipy.fftpack.dct.
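Before timing anything, it may help to see why the slope on a log-log plot estimates the exponent. Here is a minimal sketch with synthetic runtimes (the constant factor is made up):

```python
import numpy as np
from scipy.stats import linregress

# hypothetical runtimes that grow like n**3 (the constant 1e-9 is made up)
sizes = 2 ** np.arange(6, 10)
times = 1e-9 * sizes.astype(float) ** 3

# on log-log axes, t = c * n**k becomes log t = log c + k log n,
# so the slope of a fitted straight line estimates the exponent k
slope = linregress(np.log(sizes), np.log(times))[0]
print(slope)   # → 3.0 (up to floating-point error)
```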

I'll start with a noise signal and an array of power-of-two sizes.

In [3]:
signal = thinkdsp.UncorrelatedGaussianNoise()
noise = signal.make_wave(duration=1.0, framerate=16384)
noise.ys.shape
Out[3]:
(16384,)
In [12]:
ns = 2 ** np.arange(6, 10)
ns
Out[12]:
array([ 64, 128, 256, 512])

The following function takes an array of results from a timing experiment, plots the results, and fits a straight line.

In [14]:
def plot_bests(bests):    
    thinkplot.plot(ns, bests)
    thinkplot.config(xscale='log', yscale='log', legend=False)
    
    x = np.log(ns)
    y = np.log(bests)
    t = scipy.stats.linregress(x, y)
    slope = t[0]

    return slope

Here are the results for analyze1.

In [15]:
results = []
for N in ns:
    print(N)
    ts = (0.5 + np.arange(N)) / N
    freqs = (0.5 + np.arange(N)) / 2
    ys = noise.ys[:N]
    result = %timeit -r1 -o dct.analyze1(ys, freqs, ts)
    results.append(result)

bests = [result.best for result in results]
plot_bests(bests)
64
695 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
128
2.29 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
256
6.35 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
512
21.2 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
Out[15]:
1.6252403770538144

The estimated slope is close to 2, not the 3 we expected. One possibility is that the performance of np.linalg.solve is nearly quadratic in this range of array sizes.

The line is curved, which suggests that we have not reached the array size where the runtime shows cubic growth. With larger array sizes, the estimated slope increases, so maybe it eventually converges on 3.
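One way to check that hypothesis is to time np.linalg.solve on its own over the same range of sizes. Here is a sketch (the sizes and timing loop are my own; the results will depend on the BLAS library and the machine):

```python
import time
import numpy as np
from scipy.stats import linregress

def time_solve(n, repeats=3):
    """Best-of-`repeats` time to solve a random n-by-n linear system."""
    rng = np.random.default_rng(17)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        np.linalg.solve(a, b)
        best = min(best, time.perf_counter() - start)
    return best

sizes = 2 ** np.arange(6, 10)
times = [time_solve(n) for n in sizes]
slope = linregress(np.log(sizes), np.log(times))[0]
print(slope)   # machine-dependent; cubic growth may only appear for larger n
```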

Here are the results for analyze2:

In [16]:
results = []
for N in ns:
    ts = (0.5 + np.arange(N)) / N
    freqs = (0.5 + np.arange(N)) / 2
    ys = noise.ys[:N]
    result = %timeit -r1 -o dct.analyze2(ys, freqs, ts)
    results.append(result)

bests2 = [result.best for result in results]
plot_bests(bests2)
138 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10000 loops each)
943 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
3.52 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
13.7 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
Out[16]:
2.180545403522745

The results for analyze2 fall in a straight line with the estimated slope close to 2, as expected.

Here are the results for scipy.fftpack.dct:

In [17]:
results = []
for N in ns:
    ys = noise.ys[:N]
    result = %timeit -o scipy.fftpack.dct(ys, type=3)
    results.append(result)

bests3 = [result.best for result in results]
plot_bests(bests3)
4.21 µs ± 8.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.78 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
5.43 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
6.67 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Out[17]:
0.21758965837997682

This implementation of the DCT is even faster. The line is curved, which means either we haven't seen the asymptotic behavior yet, or the asymptotic behavior is not a simple power of $n$. In fact, as we'll see soon, the run time is proportional to $n \log n$.
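A quick synthetic check shows why $n \log n$ growth produces a curved line on log-log axes: the local slope is $1 + 1/\log n$, which falls toward 1 as $n$ grows (the ranges of sizes below are arbitrary):

```python
import numpy as np
from scipy.stats import linregress

def loglog_slope(sizes):
    """Fitted log-log slope for runtimes proportional to n log n."""
    times = sizes * np.log(sizes)
    return linregress(np.log(sizes), np.log(times))[0]

# the local slope of log(n log n) is 1 + 1/log(n), so it falls toward 1
small = loglog_slope(2.0 ** np.arange(4, 8))     # n from 16 to 128
large = loglog_slope(2.0 ** np.arange(16, 20))   # n from 65536 up
print(small, large)   # the slope decreases toward 1 as n grows
```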

The following figure shows all three curves on the same axes.

In [18]:
thinkplot.preplot(3)
thinkplot.plot(ns, bests, label='analyze1', color='red')
thinkplot.plot(ns, bests2, label='analyze2', color='blue')
thinkplot.plot(ns, bests3, label='fftpack.dct', color='black')
thinkplot.config(xscale='log', yscale='log', legend=True, loc='upper left')

Exercise: One of the major applications of the DCT is compression for both sound and images. In its simplest form, DCT-based compression works like this:

  1. Break a long signal into segments.
  2. Compute the DCT of each segment.
  3. Identify frequency components with amplitudes so low they are inaudible, and remove them. Store only the frequencies and amplitudes that remain.
  4. To play back the signal, load the frequencies and amplitudes for each segment and apply the inverse DCT.

Implement a version of this algorithm and apply it to a recording of music or speech. How many components can you eliminate before the difference is perceptible?
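The steps above can be sketched for a single segment using scipy alone; the toy signal and threshold here are invented for illustration, and step 1 (segmentation) is trivial with only one segment:

```python
import numpy as np
from scipy.fftpack import dct, idct

def compress_segment(ys, thresh):
    """DCT a segment, zero the small coefficients, and invert (steps 2-4)."""
    amps = dct(ys, type=2, norm='ortho')
    amps[np.abs(amps) < thresh] = 0          # drop inaudibly small components
    return idct(amps, type=2, norm='ortho')  # inverse DCT for playback

# toy segment: one cosine plus low-level noise
rng = np.random.default_rng(0)
n = 1024
ts = np.arange(n) / n
ys = np.cos(2 * np.pi * 10 * ts) + 0.01 * rng.standard_normal(n)

ys2 = compress_segment(ys, thresh=0.1)
print(np.max(np.abs(ys - ys2)))   # small reconstruction error
```

Most of the DCT coefficients of this signal are near-zero noise, so the threshold removes the vast majority of them while barely changing the reconstruction.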

thinkdsp provides a class, Dct, that is similar to a Spectrum, but which uses DCT instead of FFT.

As an example, I'll use a recording of a Tibetan singing bowl:

In [19]:
wave = thinkdsp.read_wave('music/193022__truthiswithin__tibetan-singing-bowl-struck.wav')
wave.make_audio()
Out[19]:

Here's a short segment:

In [27]:
segment = wave.segment(start=1.0, duration=0.5)
segment.normalize()
segment.make_audio()
Out[27]:

And here's the DCT of that segment:

In [40]:
seg_dct = segment.make_dct()
seg_dct.plot(high=8000)
thinkplot.config(xlabel='Frequency (Hz)', ylabel='DCT')

There are only a few harmonics with substantial amplitude, and many entries near zero.

The following function takes a DCT and sets elements below thresh to 0.

In [41]:
def compress(dct, thresh=1):
    count = 0
    for i, amp in enumerate(dct.amps):
        if abs(amp) < thresh:
            dct.hs[i] = 0
            count += 1
            
    n = len(dct.amps)
    print(count, n, 100 * count / n, sep='\t')

If we apply it to the segment, we can eliminate more than 90% of the elements:

In [42]:
seg_dct = segment.make_dct()
compress(seg_dct, thresh=10)
seg_dct.plot(high=8000)
21059	22050	95.50566893424036

And the result sounds the same (at least to me):

In [43]:
seg2 = seg_dct.make_wave()
seg2.make_audio()
Out[43]:

To compress a longer segment, we can make a DCT spectrogram. The following function is similar to wave.make_spectrogram except that it uses the DCT.

In [44]:
def make_dct_spectrogram(wave, seg_length):
    """Computes the DCT spectrogram of the wave.

    seg_length: number of samples in each segment

    returns: Spectrogram
    """
    window = np.hamming(seg_length)
    i, j = 0, seg_length
    step = seg_length // 2    # integer division so the slice indices stay integers

    # map from time to Spectrum
    spec_map = {}

    while j < len(wave.ys):
        segment = wave.slice(i, j)
        segment.window(window)

        # the nominal time for this segment is the midpoint
        t = (segment.start + segment.end) / 2
        spec_map[t] = segment.make_dct()

        i += step
        j += step

    return thinkdsp.Spectrogram(spec_map, seg_length)
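As a side note, this function advances by half a segment at a time. A quick numeric check (with a made-up segment length) shows why that works: Hamming windows at 50% overlap sum to a nearly constant value, so the windowed segments can later be recombined without audible amplitude ripple:

```python
import numpy as np

seg_length = 1024
window = np.hamming(seg_length)
step = seg_length // 2

# overlap-add the window at 50% hops and inspect the summed envelope
total = np.zeros(4 * seg_length)
for start in range(0, len(total) - seg_length + 1, step):
    total[start:start + seg_length] += window

# ignore the first and last half-window, where only one window contributes
middle = total[seg_length:-seg_length]
print(middle.min(), middle.max())   # both close to 1.08
```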

Now we can make a DCT spectrogram and apply compress to each segment:

In [46]:
spectro = make_dct_spectrogram(wave, seg_length=1024)
for t, dct in sorted(spectro.spec_map.items()):
    compress(dct, thresh=0.2)

In most segments, the compression is 75-80%.

To hear what it sounds like, we can convert the spectrogram back to a wave and play it.

In [10]:
wave2 = spectro.make_wave()
wave2.make_audio()
Out[10]:

And here's the original again for comparison.

In [11]:
wave.make_audio()
Out[11]:

As an experiment, you might try increasing thresh to see when the effect of compression becomes audible (to you).

Also, you might try compressing a signal with some noisy elements, like cymbals.